Inducing Taxonomy from Tags: An Agglomerative Hierarchical Clustering Framework

Authors

  • Xiang Li
  • Huaimin Wang
  • Gang Yin
  • Tao Wang
  • Cheng Yang
  • Yue Yu
  • Dengqing Tang
Abstract

By amassing the ‘wisdom of the crowd’, social tagging systems are drawing increasing academic attention as a means of interpreting Internet folk knowledge. To uncover their hidden semantics, several studies have attempted to induce an ontology-like taxonomy from tags. As far as we know, these methods all need to compute an overall or relative generality for each tag, which is difficult and error-prone. In this paper, we propose an agglomerative hierarchical clustering framework that relies only on the pairwise similarity between tags. We enhance the framework by integrating it with a topic model to capture thematic correlations among tags. By experimenting on a designated online tagging system, we show that our method can disclose new semantic structures that supplement the output of previous approaches. Finally, we demonstrate the effectiveness of our method with quantitative evaluations.
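
The core idea of clustering tags using nothing but pairwise similarity can be illustrated with a minimal sketch. This is not the authors' framework: the Jaccard similarity over co-tagged resources, the average-linkage merge rule, and the names `jaccard`, `agglomerate`, and `tag_resources` are all assumptions made for illustration, and the topic-model enhancement is omitted.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of resource ids (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(tag_resources):
    """Average-linkage agglomerative clustering over tags.

    tag_resources maps each tag to the set of resources it annotates
    (a hypothetical input format). Returns a nested tuple: a binary tree
    whose leaves are tags.
    """
    # Pairwise tag similarity is the only signal used; no per-tag
    # generality score is computed.
    sim = {
        frozenset((a, b)): jaccard(tag_resources[a], tag_resources[b])
        for a, b in combinations(tag_resources, 2)
    }
    # Each cluster is a pair: (tree built so far, tuple of member tags).
    clusters = [(tag, (tag,)) for tag in sorted(tag_resources)]

    def linkage(x, y):
        # Average similarity over all cross-cluster tag pairs.
        pairs = [(a, b) for a in x[1] for b in y[1]]
        return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two most similar clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

Calling `agglomerate` on a small mapping such as `{"python": {1, 2, 3}, "django": {2, 3}, "java": {4, 5}, "spring": {5}}` groups `django` with `python` and `java` with `spring` before joining the two groups at the root. The result is a binary merge tree; turning it into a taxonomy with labelled internal nodes would need a further step that this sketch does not attempt.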

Related resources

Comparing Conceptual, Divisive and Agglomerative Clustering for Learning Taxonomies from Text

The application of clustering methods for automatic taxonomy construction from text requires knowledge about the trade-off among (i) their effectiveness (quality of result), (ii) efficiency (run-time behaviour), and (iii) traceability of the taxonomy construction by the ontology engineer. In this line, we present an original conceptual clustering method based on Formal Concept Analysis fo...

Hierarchical clustering of word class distributions

We propose an unsupervised approach to POS tagging where first we associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. Then we create a hierarchical clustering of the word types: we use an agglomerative clustering algorithm where the distance between clusters is defined as the Jensen-Shannon divergence between the probability distributions...

A New Agglomerative Hierarchical Clustering Algorithm Implementation based on the Map Reduce Framework

Text clustering is a difficult and active research area in text mining. Combining the Map Reduce framework with the neuron initialization method of the VPSOM (vector pressing Self-Organizing Model) algorithm, a new text clustering algorithm is presented. It divides the large text vector dataset into data blocks, each of which is then processed on a different distributed data node of Map Red...

Publication date: 2012